Statistical significance of rich-club phenomena in complex networks
We propose that rich-club phenomena in complex networks should be defined in the spirit
of bootstrapping, in which a null model is adopted to assess the statistical significance of the
detected rich club. Our method can serve as a definition of the rich-club phenomenon and is applied
to three real networks and three model networks. The results improve significantly on previously
reported results. We also report a dilemma with an exceptional example, showing that there is no
omnipotent definition of the rich-club phenomenon.
Almost all social and natural systems are composed of a huge number of interacting
components. Many self-organized features that are absent at the microscopic level emerge
in complex systems due to the dynamics. The topological properties of the underlying
network of interacting constituents have a great impact on the dynamics of the system
evolving on it. Most complex networks exhibit small-world properties and are scale free
in the sense that the distribution of degrees has power-law tails [6]. In addition, many
real networks have modular structures or communities expressing their underlying
functional modules [7] and exhibit a self-similar and scale-invariant nature in their
topology. The modular and hierarchical structure of social networks may partly account
for the log-periodic power-law patterns presented extensively in financial bubbles and
antibubbles. A closely related feature, termed the rich-club phenomenon, has recently
been reported in some complex networks; it still lacks a consensus on its definition.

The rich-club phenomenon in complex networks refers to the observation that nodes with
high degree (called rich nodes) are inclined to connect intensely with each other. The
average hop distance within this tight group is between one and two [18]. Intuitively,
rich nodes are much more likely to organize into tight and highly interconnected groups
(clubs) than low-degree nodes. Therefore, it is rational to accept that there is a
rich-club phenomenon in the topology of the Internet. This rationale can be characterized
quantitatively by the rich-club coefficient $\phi$, which is expressed as follows [18]:



$$\phi(k) = \frac{2E_{>k}}{N_{>k}(N_{>k} - 1)}, \qquad (1)$$



where $N_{>k}$ refers to the number of nodes with degrees higher than a given value $k$
and $E_{>k}$ stands for the number of edges among these $N_{>k}$ nodes. The rich-club
coefficient $\phi(k)$ is the ratio of the actual number to the maximally possible number
of edges linking the $N_{>k}$ nodes, which measures how well the rich nodes 'know' each
other. For example, $\phi = 1$ means that the members of the club form a fully connected
network. Indeed, $\phi(k)$ is nothing but the well-known clustering coefficient of the
rich club.
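
As an illustration of Eq. (1), the following minimal Python sketch computes $\phi(k)$ from an edge list. The function name `rich_club_coefficient` and the edge-list input format are assumptions made for this example and do not come from the original analysis.

```python
from collections import defaultdict


def rich_club_coefficient(edges, k):
    """Return phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)) for an undirected graph.

    `edges` is an iterable of (u, v) pairs (hypothetical input format).
    Self-loops and duplicate edges are ignored. Returns None when fewer
    than two nodes have degree larger than k, since phi(k) is undefined.
    """
    degree = defaultdict(int)
    edge_set = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        key = frozenset((u, v))
        if key in edge_set:
            continue  # ignore duplicate edges
        edge_set.add(key)
        degree[u] += 1
        degree[v] += 1

    # Rich nodes: degree strictly larger than k.
    rich = {node for node, d in degree.items() if d > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None

    # E_{>k}: edges with both endpoints among the rich nodes.
    e_rich = sum(1 for e in edge_set if e <= rich)
    return 2.0 * e_rich / (n_rich * (n_rich - 1))


# A triangle of hubs (a, b, c), each carrying one degree-one node.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("a", "x"), ("b", "y"), ("c", "z")]
print(rich_club_coefficient(toy_edges, 1))  # the three hubs are fully connected
```

For this toy graph the three hubs are the only nodes with degree larger than 1 and they share all three possible edges, so $\phi(1) = 1$.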

Zhou and Mondragon argue that a function $\phi(k)$ that increases with $k$ provides
evidence for the presence of rich-club structure. However, Colizza et al. point out that
a monotonic increase of $\phi(k)$ is not enough to infer the presence of the rich-club
phenomenon, since even random networks generated from the ER model, the MR model and the
BA model have a $\phi(k)$ that increases with $k$ [21]. Instead, the rich-club
coefficient $\phi(k)$ should be normalized by a reference, and the correct null model
that can serve as this reference is the ensemble of maximally random networks with the
same degree sequence as the network under investigation [21]. The maximally random
networks can be generated with the chain-switching method. The normalized rich-club
coefficient is defined by



$$\rho(k) = \phi(k)/\phi_{\mathrm{ran}}(k), \qquad (2)$$
where $\phi_{\mathrm{ran}}(k)$ is the average rich-club coefficient of the maximally
random networks [21]. The actual presence of the rich-club phenomenon in a network is
confirmed if $\rho(k) > 1$ [21, 23]. In this framework, there is no rich-club ordering
in the Internet.
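
The normalization in Eq. (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the randomization uses plain pairwise edge swaps that preserve every node's degree, it does not enforce connectedness of the rewired network, and it is not claimed to be the chain-switching procedure used for the results reported here. All function names and default parameters (`n_random`, `swaps_per_edge`, `seed`) are hypothetical.

```python
import random


def degrees(edges):
    """Degree of every node in a simple undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def phi(edges, k):
    """Rich-club coefficient of Eq. (1); assumes no self-loops or duplicates."""
    deg = degrees(edges)
    rich = {n for n, d in deg.items() if d > k}
    if len(rich) < 2:
        return None
    e_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2.0 * e_rich / (len(rich) * (len(rich) - 1))


def randomize(edges, n_swaps, rng):
    """Degree-preserving rewiring: (a,b),(c,d) -> (a,d),(c,b)."""
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def normalized_rich_club(edges, k, n_random=20, swaps_per_edge=10, seed=0):
    """rho(k) = phi(k) / <phi_ran(k)>, averaged over n_random rewired copies."""
    rng = random.Random(seed)
    edges = list(edges)
    observed = phi(edges, k)
    if observed is None:
        return None
    null = [phi(randomize(edges, swaps_per_edge * len(edges), rng), k)
            for _ in range(n_random)]
    phi_ran = sum(null) / len(null)
    return observed / phi_ran if phi_ran > 0 else None
```

Because every swap removes and adds exactly one edge at each of the four nodes involved, the degree sequence, and hence the set of rich nodes for any $k$, is identical in the original and in all rewired networks.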

We have repeated the analysis of Colizza et al. [21] for three model networks, namely
the Erdős-Rényi (ER) model, the Molloy-Reed (MR) model, and the Barabási-Albert (BA)
model, and for three real-world networks: the protein interaction network [23] of the
yeast Saccharomyces cerevisiae, the scientific collaboration network collected by
Newman [23], and the Internet at the autonomous system level collected by the Oregon
Route Views project. The rich-club coefficients $\phi$ of the six networks under
investigation are presented in Fig. 1 with black circles as a function of the percentage
$g$ of the richest nodes included in the rich club. The $\phi_{\mathrm{ran}}$ functions
are also shown for the corresponding maximally random networks. We note that when we plot
$\phi(k)$ versus $k$, our results for the investigated networks are the same as those
shown in Fig. 1 of Colizza et al. [21].
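
Since the coefficients are plotted against the percentage $g$ of richest nodes rather than against the degree threshold $k$, it may help to make the mapping between the two explicit. The short Python sketch below assumes that $g$ is the fraction of nodes with degree strictly larger than $k$, as in the caption of Fig. 1; the function names are hypothetical.

```python
def rich_fraction(degree_sequence, k):
    """Fraction g of nodes whose degree is strictly larger than k."""
    n = len(degree_sequence)
    if n == 0:
        raise ValueError("empty degree sequence")
    return sum(1 for d in degree_sequence if d > k) / n


def fraction_curve(degree_sequence):
    """Pairs (k, g(k)) for each distinct degree k, from largest to smallest."""
    thresholds = sorted(set(degree_sequence), reverse=True)
    return [(k, rich_fraction(degree_sequence, k)) for k in thresholds]


# Three hubs of degree 3 and three nodes of degree 1.
print(fraction_curve([3, 3, 3, 1, 1, 1]))  # [(3, 0.0), (1, 0.5)]
```

A larger threshold $k$ corresponds to a smaller $g$, so small values of $g$ select only the richest nodes.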

Figure 2 shows the $\rho$ functions, which are not the same as those in Fig. 2 of
Colizza et al. [21]. Specifically, we find that the normalized coefficients of the
protein interaction network, the scientific collaboration network, and the ER model are
qualitatively the same as those reported by Colizza et al. [21], while those of the
other three networks are not. We note that the AS-Internet data was created by Mark
Newman from data for July 22, 2006. Figure 2 shows that the normalized coefficient
$\rho$ is not less than 1 for the Internet, the MR model, and the BA model. For the
Internet, we notice that $\phi$ is close to 1 for the richest nodes. Intuitively, the
corresponding $\phi_{\mathrm{ran}}$ should be less than 1, which is observed in our
analysis but not in that of Colizza et al. [21].

The importance of the null model has been emphasized in the assessment of some
properties claimed to be present in complex networks [22, 23, 31]. Rather than simply
normalizing the rich-club coefficient, we argue that the correct way to assess the
presence of the rich-club phenomenon is to perform a statistical test, which amounts to
determining the probability that the identified rich-club phenomenon emerges by chance.
The null hypothesis is the following:

$H_0$: $\rho(g)$ is not larger than 1.
The alternative hypothesis is that $\rho(g) > 1$. We can compute the p-value, which is
the probability of finding, under the null model, a rich-club coefficient at least as
large as the observed one. The smaller the p-value, the stronger the evidence against
the null hypothesis and in favor of the alternative hypothesis that the presence of
rich-club ordering is statistically significant. The p-value is 100% when $g = 1$, since
the club then contains all nodes and all edges of every randomized network. By adopting
the conventional significance level of $\alpha = 5\%$, the rich-club phenomenon is
statistically significant if $p < \alpha$.
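
The test can be illustrated with a one-sided empirical p-value. The text does not spell out the estimator, so the Python sketch below assumes the simplest choice: the fraction of randomized networks whose rich-club coefficient is at least as large as the observed one. The function names and the example numbers are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null-model coefficients that are >= the observed one."""
    if not null_values:
        raise ValueError("need at least one null-model value")
    hits = sum(1 for x in null_values if x >= observed)
    return hits / len(null_values)


def is_significant(observed, null_values, alpha=0.05):
    """True when the rich-club ordering is significant at level alpha."""
    return empirical_p_value(observed, null_values) < alpha


# Made-up coefficients from four randomized networks.
print(empirical_p_value(0.9, [0.1, 0.2, 0.95, 0.3]))  # 0.25
# When every randomized network reproduces the observed value (as for g = 1),
# the estimate is 1, i.e. 100%.
print(empirical_p_value(0.5, [0.5] * 10))  # 1.0
```

With this estimator the smallest resolvable p-value is set by the number of randomized networks, so testing at $\alpha = 5\%$ needs considerably more than 20 realizations to be meaningful.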

Figure 3 shows the p-values as a function of the percentage $g$ of rich nodes for the
networks investigated. For the protein interaction network and the ER network, the
p-values are larger than $\alpha$ when $g < 10\%$. Therefore, there is no rich-club
ordering in these two networks. For the Internet, except for the point at the smallest
$g$ and the point with $g = 0$, all p-values are well below $\alpha = 5\%$, indicating
significant rich-club ordering in the Internet. For the scientific collaboration
network, the p-values are less than $\alpha = 5\%$ for most values of $g$. However, the
most connected scientists, corresponding to small $g$, do not form a rich club.
According to the top-right panel of Fig. 2, the group of these most connected scientists
has a relatively large normalized rich-club coefficient. Most surprisingly, the MR
network and the BA network have significant rich-club phenomena.

Among these cases, the presence of a rich club in the Internet has stirred quite a few
debates. In a recent work, Zhou and Mondragon find that there is a clique of rich nodes
that are completely connected, which is an unmistakable hallmark of the presence of a
rich club. We can provide further evidence for this argument. As illustrated in Fig. 1,
the rich-club coefficients $\phi$ are close to 1 when $g$ is small for the Internet, the
MR model, and the BA model. This means that the richest nodes in these networks are
almost fully connected. This validates the intuitive definition that a rich club is a
group of high-degree nodes that are intensely linked. The statistical test lends further
support to the claim of Zhou and Mondragon that a rich club is present in the Internet.

A missing ingredient in the discussions of the rich-club phenomenon is the
connectedness of the rich club. When we define rich nodes as those with, for example,
$g > 1\%$ and start to investigate whether these nodes form a rich club, we should check
whether this "club" contains several disconnected sub-clubs. As illustrated in the upper
panel of Fig. 4, the rich nodes of the scientific collaboration network are not fully
connected for small $g$; they form several separate clusters. According to Fig. 3, all
these three subgraphs are rich clubs, which however contradicts the common intuition
that the members of a club are aware of each other, at least through other members of
the club. For $g = 0.141\%$, there are two rich clubs, (1, 4, 5, 9, 11, 12) and
(2, 3, 6, 10, 16, 20, 21, 14, 15, 17). With increasing richness (smaller $g$ or larger
$k$), the rich club (1, 4, 5, 9, 11, 12) remains unchanged. The second rich club
(2, 3, 6, 10, 16, 20, 21, 14, 15, 17) splits into two clubs, (2, 3, 6, 10, 16) and
(14, 15, 17), when node 20 and node 21 are removed for $g = 0.111\%$. When
$g = 0.080\%$, the rich club (14, 15, 17) disappears and (2, 3, 6, 10, 16) degenerates
to (2, 3, 6). Therefore, when there is more than one isolated cluster of nodes for a
given $g$, we should investigate their statistical significance one by one, except for
the trivial cases of isolated nodes and pairs of nodes. The lower panel of Fig. 4 shows
the results for the scientific collaboration network. One observes that $p < 5\%$ for
all clusters.
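
The connectedness check described above can be sketched as a search for connected components in the subgraph induced by the rich nodes. The Python example below is a minimal illustration; the function name `rich_sub_clubs` and the default `min_size=3`, which discards isolated nodes and pairs, are assumptions made for this sketch.

```python
from collections import defaultdict, deque


def rich_sub_clubs(edges, k, min_size=3):
    """Connected components of the subgraph induced by nodes of degree > k.

    Components with fewer than `min_size` nodes are dropped. Node labels
    are assumed to be mutually comparable (for example integers).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rich = {n for n, d in degree.items() if d > k}

    # Keep only edges with both endpoints among the rich nodes.
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in rich and v in rich:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    clubs = []
    for start in sorted(rich):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search within the rich subgraph
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            clubs.append(sorted(component))
    return clubs
```

Each returned component can then be tested separately against its counterpart in the randomized networks.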

So far, we have shown that performing a statistical test is necessary and that it does
a good job in detecting rich clubs in complex networks. However, a story always has two
sides. Consider the toy network shown in Fig. 5. The graph consists of two kinds of
nodes identified by different colors: the degree of each white node is $k = 1$, while
the red nodes are very "rich" and fully connected. It is evident that the rich-club
coefficient of the red nodes is $\phi(k=1) = 1$, and one would say without any doubt
that they form a rich club. Indeed, a qualitatively similar figure was taken as an
example of the presence of a rich club [21]. Surprisingly, this observation of
$\phi(k=1) = 1$ does not ensure that the red nodes form a rich club in either the
framework of $\rho > 1$ adopted by Colizza et al. [21] or the statistical test proposed
in this work, since $\phi_{\mathrm{ran}}(k=1) = 1$ for all maximally random networks.
Hence, we have $\rho(k=1) = 1$, which means that there is no rich-club ordering when
$k = 1$. This conclusion contradicts our intuition.

We can generalize the discussion above by considering a network consisting of $m$ rich
nodes, which are linked to $k_1, k_2, \ldots, k_m$ nodes of degree $k = 1$,
respectively. Since each node with $k = 1$ has to be linked to a node with $k > 1$ to
ensure the connectedness of the randomized network, the group of the $m$ rich nodes has
$\sum_{i=1}^{m} k_i$ out-edges and $E_{>1}$ edges among them. The value of $E_{>1}$ is
the same for all randomized networks. In other words,
$\phi_{\mathrm{ran}}(k=1) = \phi(k=1)$ and $\rho(k=1) = 1$. This class of artificial
networks invalidates the sophisticated approach based on statistical tests.
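
The invariance of $E_{>1}$ can be made explicit with a short degree-counting argument. The symbols $d_i$ (the degree of rich node $i$) and $n_1$ (the total number of degree-one nodes) are notation introduced only for this sketch, and the argument assumes, as above, that every degree-one node is attached to a rich node.

```latex
\begin{align}
  \sum_{i=1}^{m} d_i &= \sum_{i=1}^{m} k_i + 2E_{>1},
  & \sum_{i=1}^{m} k_i &= n_1, \\
  E_{>1} &= \frac{1}{2}\left(\sum_{i=1}^{m} d_i - n_1\right),
  & \phi(k=1) &= \frac{2E_{>1}}{m(m-1)}.
\end{align}
```

A degree-preserving randomization may change the individual $k_i$, but it changes neither the degrees $d_i$ nor the number $n_1$ of degree-one nodes, so $E_{>1}$ and therefore $\phi(k=1)$ take the same value in every randomized network.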

The analysis presented here provides a more rigorous methodology for detecting rich
clubs in complex networks. This allows us to understand the phenomenon on a solid basis.
However, there exists a class of artificial networks with rich clubs on which the
methods based on null models of maximally random networks fail. In this sense, the
definition of the rich-club phenomenon remains an open problem.
FIG. 1: (Color online) Rich-club coefficient $\phi$ as a function of the percentage $g$ of nodes whose degree is larger than $k$, used
for detecting rich-club ordering. The black circles are for the networks under investigation and the red squares are for the null
models. For each simulated model network, the total number of nodes is 10* and its average degree is $\langle k \rangle = 6$. The percentage
of nodes with $k > 1$ is $g = 77.1\%$ for the protein interaction network, $g = 65.9\%$ for the Internet network, $g = 86.6\%$ for the
scientific collaboration network, $g = 98.1\%$ for the ER model, and there is no node with $k = 1$ for the BA and MR networks.
FIG. 2: Normalized rich-club coefficients of the investigated networks. The ratio $\rho_{\mathrm{ran}} = \phi/\phi_{\mathrm{ran}}$ is shown as a function of the percentage
$g$ and compared with the baseline value equal to 1.
FIG. 3: Statistical tests for the presence of rich-club phenomena.




FIG. 4: (Color online) Reassessment of rich-club phenomena for the scientific collaboration network. The upper panel shows
the subgraphs with $g = 0.080\%$ ($k = 66$), $g = 0.111\%$ ($k = 57$), and $g = 0.141\%$ ($k = 54$). The lower panel shows the statistical
analysis of the sub-clubs for different $g$. The blue markers in the left panel show the rich-club coefficient $\phi$ for all isolated
sub-clubs with more than two nodes, while the red ones in the same panel are the associated $\phi_{\mathrm{ran}}$. The middle panel presents
the $\rho$ function and the right panel shows the corresponding p-values. It is observed that $p < \alpha$ for all $g$.
FIG. 5: (Color online) A schematic of an illusory rich-club phenomenon. The big red nodes have larger degrees and form a
subnetwork which is completely connected. The rich-club coefficient is $\phi = 1$. The network is disassortative with a Pearson
coefficient of $-0.77$. However, all its maximally random networks have $\phi(1) = 1$, which means that there is no rich-club
phenomenon statistically.



